Sequencing and Raw Sequence Data Quality Control ◾ 21
On a Linux terminal, we can run FastQC non-interactively and then view the generated
reports in a web browser. Before running FastQC, however, it is important to know about
the “limits.txt” file in the “Configuration” directory. This file contains the default values
for the FastQC options, including the warning and failure thresholds of each analysis
module, and we can use it to determine which reports are generated. Open the file in a text
editor of your choice and study its content. In most cases, no change is needed. At this
point, we will change only one line, from
kmer ignore 1
to
kmer ignore 0
Then save the file and exit. Setting the “ignore” value of a module to 0 enables that module,
so this change is necessary to include the k-mer report when we run the program.
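The same edit can also be scripted instead of made by hand. The following is a minimal sketch, assuming that the module line in “limits.txt” consists of the word kmer, the word ignore, and the value, separated by spaces or tabs. The function name enable_kmer_module and the path in the usage comment are hypothetical and must be adapted to the local FastQC installation. A backup copy of the original file is kept with the suffix “.bak”.

```shell
# Hypothetical helper: switch the k-mer module on in a FastQC limits file.
# Usage: enable_kmer_module /path/to/FastQC/Configuration/limits.txt
enable_kmer_module() {
    limits_file=$1
    if [ ! -f "$limits_file" ]; then
        echo "limits file not found: $limits_file" >&2
        return 1
    fi
    # On the "kmer ignore" line only, replace a trailing 1 with 0.
    # -i.bak edits the file in place and keeps the original as <file>.bak.
    sed -E -i.bak '/^kmer[[:space:]]+ignore[[:space:]]/ s/1[[:space:]]*$/0/' "$limits_file"
}
```

Because the substitution is restricted to the “kmer ignore” line, the warning and failure thresholds of the k-mer module and the settings of all other modules are left unchanged.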
The following is a simple syntax for running the FastQC program non-interactively on
the command line:
fastqc seqfile1 seqfile2 .. seqfileN
The input can be a single FASTQ file name or multiple file names separated by whitespace.
The FastQC program has several options that can be displayed using the following
command:
fastqc --help
Since we have downloaded the eight E. coli raw FASTQ files above and stored them in the
“fastQC” directory, we can either run the program on each file separately or provide all file
names as input, as shown in the above syntax. However, a more efficient way on a Linux/Unix
platform is to let bash collect the file names for us. The following bash script creates a
directory “qc”, changes to the “fastQC” directory where the FASTQ files are stored, stores
the file names in a variable “filenames”, runs the FastQC program non-interactively, and
saves the QC reports in the “qc” directory:
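# Create the output directory (no error if it already exists)
mkdir -p qc
cd fastQC
# Store the FASTQ file names in a bash array
filenames=(*.fastq)
# Run FastQC on all files, using three threads
fastqc "${filenames[@]}" \
--outdir ../qc \
--threads 3
cd ..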
Alternatively, we can pass the wildcard pattern to FastQC directly:
mkdir -p qc
cd fastQC
fastqc *.fastq --outdir ../qc --threads 3
cd ..
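When the input directory may be empty or FastQC may not be installed, it helps to check both before starting the run. The following is a minimal sketch under those assumptions. The function name run_fastqc_batch and its three arguments (input directory, output directory, number of threads) are hypothetical, and only the fastqc options already used above (--outdir and --threads) are passed.

```shell
# Hypothetical wrapper: run FastQC on every .fastq file in a directory.
# Usage: run_fastqc_batch <input_dir> <output_dir> [threads]
run_fastqc_batch() {
    in_dir=$1
    out_dir=$2
    threads=${3:-1}

    # Stop early if the fastqc executable cannot be found.
    if ! command -v fastqc >/dev/null 2>&1; then
        echo "fastqc not found on PATH" >&2
        return 1
    fi

    # Load the matching file names into the positional parameters.
    set -- "$in_dir"/*.fastq
    # An unmatched pattern is left unexpanded, so the first "file" does not exist.
    if [ ! -e "$1" ]; then
        echo "no .fastq files in $in_dir" >&2
        return 1
    fi

    mkdir -p "$out_dir"
    fastqc "$@" --outdir "$out_dir" --threads "$threads"
}
```

With the directories used in this section, the call would be run_fastqc_batch fastQC qc 3. Since the directories are passed as arguments, no cd is needed.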
The QC reports of the FASTQ files will be stored in the “qc” directory. For each FASTQ file,
FastQC generates an HTML file “*_fastqc.html” and a zipped file “*_fastqc.zip”.
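To confirm that the run produced both outputs for every input file, the naming pattern above can be checked with a short loop. This is a minimal sketch, assuming that the input files end in “.fastq” and that each report is named after its input file with “.fastq” replaced by “_fastqc.html” or “_fastqc.zip”. The function name check_fastqc_reports is hypothetical.

```shell
# Hypothetical helper: verify that each FASTQ file has both FastQC outputs.
# Usage: check_fastqc_reports <fastq_dir> <report_dir>
check_fastqc_reports() {
    fastq_dir=$1
    report_dir=$2
    missing=0
    for fq in "$fastq_dir"/*.fastq; do
        # Skip the unexpanded pattern when the directory has no .fastq files.
        [ -e "$fq" ] || continue
        base=$(basename "$fq" .fastq)
        for ext in html zip; do
            if [ ! -f "$report_dir/${base}_fastqc.$ext" ]; then
                echo "missing: ${base}_fastqc.$ext" >&2
                missing=$((missing + 1))
            fi
        done
    done
    # Succeed only when no report is missing.
    [ "$missing" -eq 0 ]
}
```

For the directories used here, the call would be check_fastqc_reports fastQC qc. It lists any missing report on standard error and returns a non-zero exit status if at least one is absent.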